Backup and archiving system by means of tape volume cassettes for data processing units

ABSTRACT

A backup and archiving system using tape cassettes is proposed which avoids the bottlenecks that a central working storage may cause at higher performance levels, especially during backup and archiving procedures. Such a backup and archiving system provides a distributed hardware architecture in which several Component Computers (6) work without reciprocal obstruction.

[0001] The invention pertains to a backup and archiving system by means of tape cassettes for data processing systems in keeping with the preamble to claim 1.

[0002] At present, tape cassettes are the lowest-priced archiving medium for realizing backup and archiving systems. It is noteworthy that, on the one hand, the current significant growth in data volume entails an increasing number of tape cassettes (hereinafter also referred to as volumes) addressing one host unit individually. On the other hand, the growth in data volume entails only a limited increase in data volume per volume.

[0003] In addition, the principal use of backup and archiving systems currently takes place in an ever narrowing archiving window, since it must not impede any application operation. Thus, any backup and archiving system has to meet high requirements of parallelization in order to be able to transfer a volume of data.

[0004] It is noteworthy that the actually usable transfer rates for backup and archiving processes are, at present, markedly lower than those which tape technology can currently support. On the one hand, this is due to the fact that only limited data rates can be transferred by individual applications. Another reason is the fact that, whenever the data streams of several applications are clustered, access to the disk systems is constrained by the data structures of the systems platforms.

[0005] Moreover, tape systems have short innovation cycles in which capacity increases and the rate of transfer rises by whole factors. One of the reasons is that more tracks are used to record on one tape. Another reason is that the degree of data compression is increasing.

[0006] Furthermore, manual operation of peripheral tape devices is more and more being automated by robot systems called stackers and ATLs (Automatic Tape Libraries).

[0007] Finally, a greater tendency to centralize archiving in computer centers is discernible, in an effort to do so in clusters of several platforms.

[0008] The above-described development trends bring about the problems detailed below.

[0009] For example, the average recording quantity (filling ratio) of the tapes decreases. Studies have revealed that, on average, tapes are filled to less than 20% of their capacity. Projected onto the new technologies, this filling ratio threatens to drop to as little as 1%. The volumes are therefore filled only partially, and hence uneconomically.

[0010] The transfer rates of the cassette drives are not fully used and represent an unused potential.

[0011] The number of tapes/cassettes rises disproportionately, requiring appropriate shelf space, and leads to high cost.

[0012] Increased cost for shelf space in a robot, compared with a conventional shelf, exacerbates the cost problem.

[0013] The cassette drives are only fully used in phases; in other words, their use is uneconomical.

[0014] Since a high degree of parallelization is required for short-term peak loads, the number of cassette drives must, in addition, be increased even though the drives are rarely in use. In other words, the cost of investment is additionally increased by technical exigencies.

[0015] In today's tape technology, obstructions continue to occur during operation, brought about by extended mount and positioning times. For example, such obstructions may occur during a so-called Reclaim of archived data for the purpose of reading or update.

[0016] As a rule, it is not possible to make adjustments to the host systems for optimal use of the tape and drive technology because functional adjustments within the applications are too costly.

[0017] Due to the incompatibility between various manufacturers and between generations of drives, problems arise for the user when expansion becomes necessary and when new technologies are used.

[0018] Previous approaches to solving these problems may be divided into two categories. One category pertains to individual solutions, while the other category pertains to integrated solutions.

[0019] A first individual solution may be referred to as “ATL” (Automatic Tape Libraries). ATL systems enable manual operation of the tape devices to be automated. Besides reducing the need for manual labor, the operation becomes more dependable and safer, and mount times are shortened because they proceed mechanically. Due to centralization for reasons of cost, an ATL is typically used jointly by several hosts. Thus, tape cassettes may frequently be used jointly by several systems regardless of the systems platform.

[0020] Another individual solution may be described as “virtual volumes”. In this solution, several volumes, viewed by the host as independently named volumes or cassettes, are embodied on one single physical volume. This increases the storage capability of the physical volumes (tape cassettes), so that fewer tape cassettes need to stand by. The specific properties of a new drive and the volumes run on it are no longer visible to the host. Therefore, adjustments are no longer necessary for operating the host, since the adjustments are captured by virtualization whenever transition is made to a new generation. In other words, the adjustments are accomplished in a software-driven virtualization stratum.
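
The virtual-volume idea of the preceding paragraph can be illustrated by a minimal sketch in Python. The class names, the capacity figures and the first-fit placement rule are assumptions made for illustration only; the text does not prescribe any particular placement strategy.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalVolume:
    """One tape cassette; capacity and sizes are in arbitrary units (assumed)."""
    label: str
    capacity: int
    used: int = 0
    virtual_labels: list = field(default_factory=list)


class VirtualVolumeCatalog:
    """Maps host-visible volume names to the physical cassette holding them."""

    def __init__(self, physical_volumes):
        self.physical_volumes = list(physical_volumes)
        self.location = {}  # virtual label -> physical label

    def place(self, virtual_label, size):
        # First-fit placement: an assumption chosen for brevity.
        for pv in self.physical_volumes:
            if pv.capacity - pv.used >= size:
                pv.used += size
                pv.virtual_labels.append(virtual_label)
                self.location[virtual_label] = pv.label
                return pv.label
        raise RuntimeError("no physical volume has enough free space")


# Three host-visible volumes share two physical cassettes.
catalog = VirtualVolumeCatalog(
    [PhysicalVolume("P0001", 100), PhysicalVolume("P0002", 100)]
)
for name, size in [("V0001", 40), ("V0002", 40), ("V0003", 40)]:
    catalog.place(name, size)
```

The host only ever sees the names V0001 to V0003; which cassette they reside on is a matter for the catalog alone.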

[0021] Yet another individual solution is temporary storage of data. The data are temporarily stored in a volume cache. In other words, entire virtual volumes are saved in a disk memory, in order to allow immediate writing (without prior mount time) and faster reading (without mount and positioning times). Retrieval and storage from a virtual volume to the volume cache may then take advantage of the physical tape transfer rate. In other words, the performance requirements of archiving may be met with fewer drives than in instances in which the host accesses the tape cassette drives directly. At the host interface, the number of available virtual cassette drives may be greater than available physically installed cassette drives.

[0022] Variable mechanisms are used for optimized management of the temporary disk memory, so that advance reservations etc. are possible. They control the exact time of secondary data transfer, i.e. the point at which a virtual volume is transferred between volume cache and physical cassettes.

[0023] In order to prevent data losses when errors occur in temporary memory, steps are taken to make the disks failsafe. The use of so-called RAID disks is an example of such steps.

[0024] Volume caching is typically superimposed on a standard file system. A UNIX file system is an example of such a standard file system. This file system also contains data such as label contents from the virtual volumes being managed, as well as meta information. Meta information may, for example, be information indicating which virtual volume resides in which physical tape cassette, etc.

[0025] An appropriately dimensioned volume cache may, given a short archiving time window such as a 2-hour time window, achieve a nearly continuous optimal load for the cassette drives over 24 hours with high data traffic between host and disk.
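
The effect of spreading a short archiving window over a full day can be made concrete with a small calculation, sketched here in Python. The host rate of 120 MB/s is an assumed figure, and the model assumes that the cache is drained to tape at a constant rate, including during the window itself.

```python
def average_drain_rate(host_rate_mb_s, window_hours, drain_hours=24.0):
    """Tape rate needed to drain one window's worth of data over drain_hours."""
    return host_rate_mb_s * window_hours / drain_hours


def peak_cache_mb(host_rate_mb_s, window_hours, drain_hours=24.0):
    """Cache occupancy at the end of the window, with draining already running."""
    drain = average_drain_rate(host_rate_mb_s, window_hours, drain_hours)
    return (host_rate_mb_s - drain) * window_hours * 3600.0


# Assumed example: hosts deliver 120 MB/s during a 2-hour window.
tape_rate = average_drain_rate(120, 2)   # 10.0 MB/s sustained over 24 hours
cache_size = peak_cache_mb(120, 2)       # 792000.0 MB held in the volume cache
```

Under these assumptions the cassette drives need to sustain only one twelfth of the host-side rate, at the price of a volume cache large enough to hold the difference.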

[0026] In an integrated solution, virtualization, caching and operation are simultaneously achieved in any system via an ATL. Appropriate processing capacities are autonomously provided by a system architecture shown in detail in FIG. 1 and hereinafter referred to as Architecture Model M1.

[0027] The integrated solution in accordance with Architecture Model M1 arises as a natural approach to solve the problems described above. On the other hand, Architecture Model M1 leads to new problems.

[0028] The system in accordance with Architecture Model M1 may itself become a bottleneck in the case of certain configurations. The scalability of any such system is, therefore, already too narrow for current installations.

[0029] It is furthermore problematic that guarantees versus the host regarding transfer rates are by now possible only up to a point, due to the system's complex internal processes with reciprocal obstruction. Particularly, operational obstructions may arise due to internal systems reorganization processes when, for example, a tape containing few data is to be fully used again because of a cassette recycling.

[0030] The scant use of the tapes is due, on the one hand, to additional data being regularly added to the end of the tape while, on the other hand, invalid data can be so marked but cannot be deleted from the tape. The obstructed space on the tapes can only be recaptured by a reorganization, i.e. by selecting, temporarily storing and then writing the desired data to a reformatted tape. This entails an enormous additional burden on the CPU, especially, and on the bus system of the archiving systems, with the effect that overall performance of the systems declines even further.
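
The reorganization described above can be sketched in a few lines of Python. The record layout and the helper names are illustrative assumptions; a tape is modelled simply as an append-only list in which invalid records can be marked but not removed.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    size: int
    valid: bool = True  # invalid data can be marked but not deleted in place


def fill_ratio(tape, capacity):
    """Share of the tape capacity occupied by still-valid data."""
    return sum(r.size for r in tape if r.valid) / capacity


def reorganize(tape):
    """Copy only the valid records to a fresh tape.

    Returns the new tape and the amount of space recaptured.
    """
    new_tape = [Record(r.name, r.size) for r in tape if r.valid]
    reclaimed = sum(r.size for r in tape if not r.valid)
    return new_tape, reclaimed


old_tape = [Record("a", 30), Record("b", 50, valid=False), Record("c", 10)]
new_tape, reclaimed = reorganize(old_tape)
```

Every valid record is read and written once more during this pass, which is the additional load on CPU and bus system that the paragraph refers to.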

[0031] It is also problematic that the system additionally represents a new danger. Additional efforts are required to avoid systems failure when individual components fail.

[0032] To avoid the bottleneck in the data transfer, multiprocessors and multibus systems are used. However, all of the data transferred between host and disks, and all of the data transferred between disks and tape cassettes have to be moved via the CPUs' working storage, as the data formats, such as the blocking of data, header information etc., differ between host, caching disk and tape cassettes. Therefore, it turns out that the rate of transfer to the CPUs' working storage is the limiting factor for the data transfer of the entire system. This is true in equal measure for multiprocessor systems.

[0033] This potential bottleneck limits the scalability of systems following Architecture Model M1 and may force the user to operate several systems with consequently separate data quantities. This results in organizational problems for the user, such as the need for reorganizing internal work processes.

[0034] It is a particular problem that it is not possible to guarantee transfer rates versus the hosts. Indirectly launched transfers between the volume cache and physical tape cassettes obstruct the data traffic between host and volume cache, since they both have to be reformatted via the working storage. Resulting fluctuations of the transfer rates available to the host may require for their avoidance a reserve capacity in the system which cannot otherwise be provided. Attempts to copy from one cassette drive directly onto another without going through the volume cache, while reducing load on the working storage, require an additional physical drive. This renders the external control of the internal optimization even more difficult.

[0035] Failsafe dual systems in accordance with Architecture Model M1 have not appeared on the market so far. Since they involve additional coordination efforts, they may also entail additional bottlenecks and control problems.

[0036] It is the task of the invention at hand to specify a backup and archiving system of the type described above, in which a bottleneck due to a central working storage is avoided, especially in backup and archiving procedures in higher performance contexts.

[0037] This task has been solved by a backup and archiving system having the characteristics of claim 1, thus achieving a distributed hardware architecture in which several component computers work without reciprocal interference. This avoids resorting to a bottleneck-causing central hardware element for data conversion, such as the central working storage in an Architecture Model M1 as described above. In accordance with the distributed hardware architecture, several autonomous working memories, i.e. one per component computer, are available to handle data conversion.

[0038] Moreover, this distributed hardware architecture has the advantage of enabling an increased number of component computers to raise the overall performance of the backup and archiving system beyond the performance spectrum required today. The addition of further data entries and/or cassette drives can be handled by simply adding more component computers.
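
The difference in scaling behaviour can be illustrated with a deliberately simplified throughput model in Python. The function names and all rates are assumptions; the model considers only the working-storage limit and ignores the disk memory subsystem and the interconnect, which would impose limits of their own.

```python
def m1_throughput(stream_rates, central_memory_rate):
    """Central design: every stream passes through one shared working storage."""
    return min(sum(stream_rates), central_memory_rate)


def m2_throughput(stream_rates, per_computer_memory_rate):
    """Distributed design: one stream per component computer (assumed),
    each limited only by that computer's own working storage."""
    return sum(min(rate, per_computer_memory_rate) for rate in stream_rates)


# Assumed example: eight streams of 40 MB/s each.
streams = [40] * 8
central = m1_throughput(streams, central_memory_rate=200)            # 200
distributed = m2_throughput(streams, per_computer_memory_rate=100)   # 320
```

In this toy model the central design saturates at the rate of its single working storage, whereas the distributed design grows with every stream that is given its own component computer.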

[0039] It is also of advantage that software components required for realizing the backup and archiving system are scalable. The hardware basis is of no importance; i.e., it does not matter whether the system is based on a single processor or on multiple processors.

[0040] The device in this invention permits fundamentally higher overall rates of transfer than prior architectures. Should additional performance increases become necessary, they can be realized because the capacity of all available interface media is expandable beyond currently foreseeable needs, by means of multiplication.

[0041] Moreover, it is of advantage that, whenever guarantees for transfer rates versus the hosts are required, reserve capacities are unnecessary.

[0042] Furthermore, complex special solutions are avoided for relieving the bottleneck of data conversion in main storage for recycling cassettes.

[0043] Finally, it is of advantage that the use of standard hardware components for the first, second and third functional unit is also feasible for peak performance levels, which leads to lower overall cost.

[0044] Advantageous configurations of the invention are described in subordinate claims.

[0045] In accordance therewith, LAN, SCSI and FC connection structures may be used for fast data transfer.

[0046] If several component computers and cassette drives or hosts are interconnected via appropriate multiple connectors, the component computers may be reciprocally used as substitute computers at no extra cost, making the system highly failsafe. This does not impinge on the coordination of normal operation.

[0047] An additional increase in the system's performance is achieved through multiple layout of hardware components needed for access to the volume cache, for communication regarding access to the volume cache, and for communication regarding management tasks between the component computers, because bottlenecks which might otherwise arise at these points are avoided.

[0048] Below, one sample implementation of the invention is explained in greater detail, using a drawing. It comprises:

[0049] FIG. 1 a schematic of a backup and archiving system with one state-of-the-art integrated volume cache;

[0050] FIG. 2 a schematic of a backup and archiving system with a distributed hardware architecture in accordance with the invention;

[0051] FIG. 3 a schematic of the software architecture of a backup and archiving system in accordance with FIG. 2; and

[0052] FIG. 4 a schematic summary of the hardware and software architecture in accordance with FIGS. 2 and 3 regarding a general Virtual Tape Library System in accordance with the invention as shown in FIG. 2.

[0053] The backup and archiving system by means of tape cassettes for data processing systems shown in FIG. 1 is connected to a Host Unit 1. Optionally, a second host unit, or additional Host Units 1 may be included. The backup and archiving system according to FIG. 1 has at least one Cassette Drive 2 for tape cassettes. Moreover, a Disk Memory Subsystem 3 is included which comprises at least one Disk Memory Unit 4. For mutual data-technical interface of existing Hosts 1, Cassette Drive 2, and Disk Memory Subsystem 3, a data-technical Interface Unit 5 has been inserted. As also shown in FIG. 1, data-technical Interface Unit 5 consists of a single computer with Bus System 17. The single computer consists of one or several CPUs (Central Processing Units), i.e., of one or several central processors which, together with a Central Working Storage 16, process the data transfers between the Hosts 1, the Cassette Drives 2 and the Disk Memory Subsystem 3. For this purpose, the CPUs and the Central Working Storage 16 are connected to Bus System 17, to which Hosts 1, the Cassette Drive 2, and the Disk Memory Subsystem 3 are likewise connected. Data conversions necessitated by the backup and archiving processes take place via the Central Working Storage 16.

[0054] FIG. 2 shows a backup and archiving system in accordance with the invention which is based on a virtually distributed hardware and software architecture. Analogously to the architecture of the backup and archiving system shown in FIG. 1, which has also been called Architecture Model M1, the architecture of the backup and archiving system shown in FIG. 2 may be regarded as Architecture Model M2.

[0055] Just like the backup and archiving system shown in FIG. 1, the backup and archiving system shown in FIG. 2 is connected to one or several Hosts 1 and one or several Cassette Drives 2. The data from Hosts 1 are presented at Data Ports 15. Furthermore, a Disk Memory Subsystem 3 with at least one Disk Memory Unit 4 is present as part of a data-technical Interface Unit 5. As in FIG. 1, the data-technical Interface Unit 5 is connected to the Hosts 1 and the Cassette Drives 2.

[0056] In contrast with FIG. 1, discrete Function Units 11 and 12 are provided within the data-technical Interface Unit 5 of FIG. 2, each having several Component Computers 6 with respective CPUs and working memories, for handling the data-technical processes required for backup and archiving procedures. A second Function Unit 11 transfers the data received from at least one Data Port 15 to Disk Memory Subsystem 3, while a third Function Unit 12 is provided for transfer of the data temporarily stored on Disk Memory Subsystem 3 to at least one Cassette Drive 2. A first Function Unit 10 coordinates and controls the data flow between Data Ports 15, the Cassette Drives 2, and Disk Memory Subsystem 3. In the sample implementation shown in FIG. 2, Function Units 11 and 12 are realized by two Component Computers 6 each, which are each connected to Disk Memory Subsystem 3. Moreover, a few of the Component Computers 6 are each connected to at least one Host 1, for the purpose of handling the data transfer to the host side. A few other Component Computers 6 are connected, in the direction of the cassette drive side, to one Cassette Drive 2 each. The number of Component Computers 6 may be variably chosen.

[0057] Disk Memory Subsystem 3 is sometimes referred to as volume cache in the description.

[0058] FIG. 4 shows the facts of FIG. 2 in greater detail. In FIG. 4, Component Computers 6 are interconnected by an appropriate first Interface Element 7 for the purpose of data-technical exchange of information.

[0059] In the sample implementation, this first Interface Element 7 is realized by a LAN (Local Area Network), i.e. by a local network. A local network is a hardware- and software-related conjunction of computers into a functional system.

[0060] As FIG. 4 also shows, appropriate Component Computers 6 are connected, for a data transfer between the appropriate Component Computers 6 and the Disk Memory Subsystem 3 or the Cassette Drives 2, to Disk Memory Subsystem 3 or the Cassette Drives 2 by an appropriate and fast second Interface Element (8; 9). In particular, Interface Element 8 is realized between the appropriate Component Computers 6 and Disk Memory Subsystem 3 by means of an FC technology (Fibre Channel), and Interface Element 9 is realized between the appropriate Component Computers 6 and the Cassette Drives 2 by means of an SCSI technology (Small Computer System Interface). Interface Element 8 between the appropriate Component Computers 6 and Disk Memory Subsystem 3 might, in a different sample implementation, also be realized by means of an FC-AL technology (Fibre Channel Arbitrated Loop). The FC technology permits bridging great distances up to 10 km.

[0061] In the backup and archiving system shown in FIG. 2 or 4, a distributed file system has been realized with a coordinating function for access to files in this file system by internal processes that run, distributed or not distributed, on Component Computers 6. Communication of these processes takes place via the first Interface Element 7 located between Component Computers 6.

[0062] Among the processes there are processes which are accomplished by first Function Components 10 (FIG. 4) and which realize a strategy function by which decisions regarding data placement and regarding the time of their retrieval and storage in the disk memory subsystem are triggered. These processes will be abbreviated as VLP (Virtual Library Process) below.

[0063] Among the above-mentioned processes, there are furthermore processes which are accomplished by second Function Components 11 (FIG. 4) and which realize access from the host units to Disk Memory Subsystem 3. These processes will be abbreviated as ICP (Internal Channel Process) below.

[0064] Finally, among the above-mentioned processes, there are processes which are accomplished by third Function Components 12 (FIG. 4) and which control the data transfer between Disk Memory Subsystem 3 and Cassette Drives 2. These processes will be abbreviated as IDP (Internal Device Process) below.
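
The division of labour between VLP, ICP and IDP may be illustrated by the following minimal Python sketch. It runs all three roles inside one program, whereas the described system distributes them over Component Computers 6; the class layout, the in-memory cache and the least-filled-drive rule are assumptions made purely for illustration.

```python
class VolumeCache:
    """Stand-in for Disk Memory Subsystem 3."""

    def __init__(self):
        self.volumes = {}  # virtual volume label -> data


class ICP:
    """Host-side process: stores data arriving from a host in the volume cache."""

    def __init__(self, cache):
        self.cache = cache

    def write(self, label, data):
        self.cache.volumes[label] = data


class IDP:
    """Device-side process: copies a cached volume to its cassette drive."""

    def __init__(self, cache):
        self.cache = cache
        self.tape = []  # stand-in for the cassette in the attached drive

    def migrate(self, label):
        self.tape.append((label, self.cache.volumes[label]))


class VLP:
    """Strategy process: decides when and where cached volumes go to tape."""

    def __init__(self, idps):
        self.idps = idps
        self.catalog = {}  # virtual volume label -> index of the IDP used

    def schedule(self, label):
        # Least-filled drive first: an assumed placement rule.
        index = min(range(len(self.idps)), key=lambda i: len(self.idps[i].tape))
        self.idps[index].migrate(label)
        self.catalog[label] = index
        return index


cache = VolumeCache()
icp = ICP(cache)
idps = [IDP(cache), IDP(cache)]
vlp = VLP(idps)

icp.write("V0001", b"payroll")
icp.write("V0002", b"orders")
vlp.schedule("V0001")
vlp.schedule("V0002")
```

Note that the ICP never touches a drive and the IDP never talks to a host; both meet only at the volume cache, and each performs its data conversion in its own working memory.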

[0065] Disk Memory Subsystem 3 in FIGS. 2 and 4 has been realized on the basis of a RAID system (Redundant Array of Independent Disks) such as RAID1 and/or RAID3.

[0066] FIG. 3 shows an overview of the software architecture realized on the Virtual Tape Library System (VTLS) shown in FIGS. 2 and 4. It is based on a systems basis with communication mechanisms and operator interfaces.

[0067] In accordance with the sample implementation, especially as in FIG. 4, a UNIX system serves as a systems basis. For joining the Component Computers 6 to the rest of the system, Standard Peripheral Channel Connectors (SPCC) were used. The SPCCs are a Siemens product (e.g., Channel Adapter 3970). They are special high-performance adapters which bring about the physical connection with the appropriate individual system components.

[0068] On this systems basis, UNIX downward-compatibly establishes the distributed file system with a file block structure suitable for tape operation.

[0069] At the next higher stratum, the various processes ICP, IDP and VLP have been realized in parallel with one another. VLP manages the cache catalog and coordinates file access.

[0070] At the top stratum, the host connections, magnetic tape connections (connections to the cassette drives) and robot connections (connections to the tape storage units) have been realized.

[0071] FIG. 4 shows a typical hardware-software configuration of a VTLS containing five Component Computers 6. Each Component Computer 6 contains an SPCC system as a basis for the software strata imposed over it (FIG. 3). For greater protection against failure, the External Connections 14 have been doubled. This permits ICP1 and ICP2 to take on each other's tasks. The VLP runs on an autonomous Component Computer 6. If this Component Computer 6 fails, the VLP is restarted on another Component Computer 6, such as the one running IDP1. Since all of the data regarding the processing status of the Virtual Volumes and the physical cassettes are stored in the failsafe Disk Memory Subsystem 3, the restarted process is able to continue the interrupted process after a brief delay.
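
The restart behaviour can be sketched in Python as follows. The computer names, the status values and the selection of the first surviving computer are assumptions for illustration; the essential point taken from the text is that the processing status lives in the shared, failsafe disk memory rather than in the failed computer.

```python
class SharedStore:
    """Stand-in for the failsafe Disk Memory Subsystem 3."""

    def __init__(self):
        self.status = {}  # virtual volume label -> processing status


class VlpProcess:
    """A VLP instance; all of its state is kept in the shared store."""

    def __init__(self, host_name, store):
        self.host_name = host_name
        self.store = store

    def record(self, volume, state):
        self.store.status[volume] = state

    def pending(self):
        """Volumes whose transfer to tape has not been completed yet."""
        return sorted(v for v, s in self.store.status.items() if s != "on_tape")


def restart_vlp(store, candidates, failed):
    """Start a new VLP on the first surviving component computer (assumed rule)."""
    for name in candidates:
        if name != failed:
            return VlpProcess(name, store)
    raise RuntimeError("no surviving component computer")


store = SharedStore()
vlp = VlpProcess("cc-vlp", store)
vlp.record("V0001", "on_tape")
vlp.record("V0002", "in_cache")

# The computer running the VLP fails; a replacement picks up the stored status.
replacement = restart_vlp(store, ["cc-vlp", "cc-idp1", "cc-idp2"], failed="cc-vlp")
```

Because nothing of importance was held only in the failed computer's memory, the replacement process can determine the unfinished work directly from the shared store.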

[0072] The channel connections to Hosts 1 may be realized by failsafe networked ESCON channel connections (Enterprise Systems CONnection). ESCON technology is an IBM product.

Patent claims
 1. Backup and archiving system by means of tape cassettes for data processing systems with at least one Data Port (15) for receiving data to be stored in memory, at least one Cassette Drive (2) for tape cassettes, one Interface Unit (5) for connecting at least one data port with at least one Cassette Drive (2), comprising a Disk Memory Subsystem (3) with at least one Disk Memory Unit (4) for temporary storage of data to be secured on the tape cassettes, characterized by a second Function Unit (11) designed to transfer the data received by at least one Data Port (15) to the Disk Memory Subsystem (3), a third Function Unit (12) designed to transfer the data temporarily stored on the Disk Memory Subsystem (3) to at least one Cassette Drive (2), and a first Function Unit (10) designed to monitor the second and third Function Unit (11, 12) in order to check the processing and the access to the Disk Memory Subsystem (3), wherein at least one Component Computer (6) each is being provided for the first, second and third Function Unit (10, 11, 12).
 2. Backup and archiving system in accordance with claim 1, characterized by the fact that for each Data Port (15) and/or each Cassette Drive (2) one Component Computer (6) is provided which may be separately accessed by the first Function Unit (10).
 3. Backup and archiving system in accordance with claim 1, characterized by the fact that the third Function Unit (12) comprises several Component Computers (6), each of the Component Computers (6) being connected to a Cassette Drive (2) for tape cassettes, and each Component Computer (6) separately accessible by the first Function Unit (10).
 4. Backup and archiving system in accordance with claim 2 or 3, characterized by the fact that each Component Computer (6) comprises a microprocessor and a working storage as well as a bus system.
 5. Backup and archiving system in accordance with claim 1 or 2, characterized by the fact that the connection between the second and third Function Unit (11, 12) and the Disk Memory Subsystem (3) is realized via a Bus System (8) in an FC and/or an SCSI connection structure.
 6. Backup and archiving system in accordance with any of the foregoing claims, characterized by the fact that, for mutual substitute computer service, multiple connectors are provided between several Component Computers (6) and the Cassette Drives (2) and/or the Hosts (1) submitting data at the Data Ports (15).
 7. Backup and archiving system in accordance with any of the foregoing claims, characterized by the fact that a multiple layout is provided of at least one of the hardware components required for access to Disk Memory Subsystem (3), for communication regarding access to Disk Memory Subsystem (3), and for communication regarding management tasks between the Component Computers (6).